{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/app_deep_learning/blob/main/t81_558_class_08_3_keras_hyperparameters.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# T81-558: Applications of Deep Neural Networks\n",
    "**Module 8: Kaggle Data Sets**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Module 8 Material\n",
    "\n",
    "* Part 8.1: Introduction to Kaggle [[Video]](https://www.youtube.com/watch?v=7Mk46fb0Ayg&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_08_1_kaggle_intro.ipynb)\n",
    "* Part 8.2: Building Ensembles with Scikit-Learn and PyTorch [[Video]](https://www.youtube.com/watch?v=przbLRCRL24&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_08_2_pytorch_ensembles.ipynb)\n",
    "* **Part 8.3: How Should you Architect Your PyTorch Neural Network: Hyperparameters** [[Video]](https://www.youtube.com/watch?v=YTL2BR4U2Ng&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_08_3_pytorch_hyperparameters.ipynb)\n",
    "* Part 8.4: Bayesian Hyperparameter Optimization for PyTorch [[Video]](https://www.youtube.com/watch?v=1f4psgAcefU&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](t81_558_class_08_4_bayesian_hyperparameter_opt.ipynb)\n",
    "* Part 8.5: Current Semester's Kaggle [[Video]] [[Notebook]](t81_558_class_08_5_kaggle_project.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
    "The following code ensures that Google CoLab is running the correct version of TensorFlow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: not using Google CoLab\n"
     ]
    }
   ],
   "source": [
    "# Startup CoLab\n",
    "try:\n",
    "    import google.colab\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False\n",
    "\n",
    "\n",
    "# Nicely formatted time string\n",
    "def hms_string(sec_elapsed):\n",
    "    h = int(sec_elapsed / (60 * 60))\n",
    "    m = int((sec_elapsed % (60 * 60)) / 60)\n",
    "    s = sec_elapsed % 60\n",
    "    return \"{}:{:>02}:{:>05.2f}\".format(h, m, s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 8.3: Architecting Network: Hyperparameters\n",
    "\n",
    "You have probably noticed several hyperparameters introduced previously in this course that you need to choose for your neural network. The number of layers, neuron counts per layer, layer types, and activation functions are all choices you must make to optimize your neural network. Some of the categories of hyperparameters for you to choose from coming from the following list:\n",
    "\n",
    "* Number of Hidden Layers and Neuron Counts\n",
    "* Activation Functions\n",
    "* Advanced Activation Functions\n",
    "* Regularization: L1, L2, Dropout\n",
    "* Batch Normalization\n",
    "* Training Parameters\n",
    "\n",
    "The following sections will introduce each of these categories for PyTorch. While I will provide some general guidelines for hyperparameter selection, no two tasks are the same. You will benefit from experimentation with these values to determine what works best for your neural network. In the next part, we will see how machine learning can select some of these values independently.\n",
    "\n",
    "## Number of Hidden Layers and Neuron Counts\n",
    "\n",
    "The structure of PyTorch layers is perhaps the hyperparameters that most become aware of first. How many layers should you have? How many neurons are on each layer? What activation function and layer type should you use? These are all questions that come up when designing a neural network. There are many different [types of layer](https://pytorch.org/docs/stable/nn.html) in PyTorch, listed here:\n",
    "\n",
    "* **Activation** - PyTorch allows you to add activation functions using torch.nn modules. Instead of an activation layer, you typically specify the activation function directly after a Linear (or other) layer type.\n",
    "* **Regularization** For L1/L2 regularization in PyTorch, you generally don't use a separate layer. Instead, you can add weight decay when setting up an optimizer like SGD or Adam. This works as an L2 regularization. For L1, you might need to implement it manually.\n",
    "* **Linear** - The original neural network layer type. In this layer type, every neuron connects to the next layer. The input vector is one-dimensional, and placing specific inputs next does not affect each other. \n",
    "* **Dropout** -  It operates by randomly setting a fraction of input units to 0 at each forward pass, which helps in preventing overfitting. In PyTorch, Dropout is applied during training only by default.\n",
    "* **Flatten** - Flattens the input to 1D and does not affect the batch size.\n",
    "* **Permute** - PyTorch tensors have a permute method that can be used to rearrange the dimensions of a tensor, which is useful when working with different types of layers that expect certain input shapes and for tasks such as connecting RNNs and convolutional networks.\n",
    "* **RepeatVector** - Repeats the input n times.\n",
    " \n",
    "There is always trial and error for choosing a good number of neurons and hidden layers. Generally, the number of neurons on each layer will be larger closer to the hidden layer and smaller towards the output layer. This configuration gives the neural network a somewhat triangular or trapezoid appearance.\n",
    "\n",
    "## Activation Functions\n",
    "\n",
    "Activation functions are a choice that you must make for each layer. Generally, you can follow this guideline:\n",
    "* Hidden Layers - RELU\n",
    "* Output Layer - Softmax for classification, linear for regression.\n",
    "\n",
    "Some of the common activation functions in PyTorch are listed here:\n",
    "\n",
    "* **softmax** - Used for multi-class classification.  Ensures all output neurons behave as probabilities and sum to 1.0.\n",
    "* **elu** - Exponential linear unit.  Exponential Linear Unit or its widely known name ELU is a function that tends to converge cost to zero faster and produce more accurate results. Can produce negative outputs.\n",
    "* **selu** - Scaled Exponential Linear Unit (SELU), essentially **elu** multiplied by a scaling constant.\n",
    "* **softplus** - Softplus activation function. $log(exp(x) + 1)$  [Introduced](https://papers.nips.cc/paper/1920-incorporating-second-order-functional-knowledge-for-better-option-pricing.pdf) in 2001.\n",
    "* **softsign** Softsign activation function. $x / (abs(x) + 1)$ Similar to tanh, but not widely used.\n",
    "* **relu** - Very popular neural network activation function.  Used for hidden layers, cannot output negative values. No trainable parameters.\n",
    "* **tanh** Classic neural network activation function, though often replaced by relu family on modern networks.\n",
    "* **sigmoid** - Classic neural network activation.  Often used on output layer of a binary classifier.\n",
    "* **hard_sigmoid** - Less computationally expensive variant of sigmoid.\n",
    "* **exp** - Exponential (base e) activation function.\n",
    "\n",
    "For more information about PyTorch activation functions refer to the following:\n",
    "\n",
    "* [PyTorch Activation Functions](https://pytorch.org/docs/stable/nn.html)\n",
    "* [Activation Function Cheat Sheets](https://pytorch.org/docs/stable/nn.html)\n",
    "\n",
    "\n",
    "## Batch Normalization and Dropout\n",
    "\n",
    "* [PyTorch Dropout](https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html)\n",
    "* [PyTorch Batch Normalization](https://pytorch.org/docs/stable/generated/torch.nn.functional.batch_norm.html)\n",
    "\n",
    "* Ioffe, S., & Szegedy, C. (2015). [Batch normalization: Accelerating deep network training by reducing internal covariate shift](https://arxiv.org/abs/1502.03167). *arXiv preprint arXiv:1502.03167*.\n",
    "\n",
    "Normalize the activations of the previous layer at each batch, i.e. applies a transformation that maintains the mean activation close to 0 and the activation standard deviation close to 1. Can allow learning rate to be larger.\n",
    "\n",
    "\n",
    "## Training Parameters\n",
    "\n",
    "* [PyTorch Optimizers](https://pytorch.org/docs/stable/optim.html)\n",
    "\n",
    "* **Batch Size** - Usually small, such as 32 or so.\n",
    "* **Learning Rate**  - Usually small, 1e-3 or so.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3.9 (torch)",
   "language": "python",
   "name": "pytorch"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
